Using Hadoop To Implement A Semantic Method For Assessing The Quality Of Medical Data

Abstract

Recent technological advances in modern healthcare have led to a vast wealth of patient data being collected. This data is not only utilised for diagnosis but also has the potential to be used for medical research. However, datasets used for medical research often contain many errors, with one study finding error rates ranging from 2.3% to 26.9% in a selection of medical research databases. Previous methods of automatically assessing data quality have often relied on threshold rules, which can miss errors that require complex domain knowledge to identify. To combat this, a semantic framework has been developed to assess the quality of medical data expressed as linked open data. Early work in this direction revealed that existing triplestores are unable to cope with large amounts of medical data. In this thesis, a system for storing and querying medical RDF data using Hadoop is developed. This approach enables the creation of an inherently parallel framework that scales the workload across a cluster. Unlike existing solutions, this framework uses highly optimised joining strategies to complete eight separate SPARQL queries, comprising over eighty distinct joins, in only two Map/Reduce iterations. Results are presented comparing both naïve and optimised versions of the solution against Jena TDB, demonstrating the superior performance of the Hadoop system and its viability for assessing the quality of medical data.
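The abstract does not describe the joining strategies themselves. As a rough illustration only of how a join between triple patterns can map onto a single Map/Reduce pass, the sketch below simulates a generic reduce-side join on the subject in plain Python. It is not the thesis's optimised method and does not use the Hadoop API; the triples, the `ex:` predicates, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); names are illustrative only.
TRIPLES = [
    ("patient1", "ex:hasAge", "34"),
    ("patient1", "ex:hasDiagnosis", "ex:Menopause"),
    ("patient2", "ex:hasAge", "61"),
    ("patient2", "ex:hasDiagnosis", "ex:Menopause"),
    ("patient3", "ex:hasAge", "47"),
]


def map_phase(triples, predicates):
    """Emit (subject, (predicate, object)) for triples with a wanted predicate."""
    for s, p, o in triples:
        if p in predicates:
            yield s, (p, o)


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, predicates):
    """Join on subject: keep subjects that have a value for every wanted predicate.

    Assumes at most one object per predicate per subject (a simplification).
    """
    for subject, values in sorted(groups.items()):
        row = dict(values)
        if all(p in row for p in predicates):
            yield (subject,) + tuple(row[p] for p in predicates)


def join_on_subject(triples, predicates):
    """Run the simulated map, shuffle, and reduce steps in sequence."""
    return list(reduce_phase(shuffle(map_phase(triples, predicates)), predicates))
```

Calling `join_on_subject(TRIPLES, ("ex:hasAge", "ex:hasDiagnosis"))` returns one row per subject that matches both patterns. Because every triple pattern sharing the same join variable is keyed identically, several such joins can be resolved in the same reduce step, which is one general way multiple joins can share a Map/Reduce iteration.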


Similar resources

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. In a homogeneous Hadoop cluster, the current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy assume that each node in the cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...


Sentiment Analysis of Social Networking Data Using Categorized Dictionary

Sentiment analysis is the process of analyzing a person’s perception or belief about a particular subject matter. However, finding the correct opinion or interest in multi-faceted sentiment data is a tedious task. This paper proposes a method to improve sentiment accuracy by using a categorized dictionary for sentiment classification and analysis. A categorized dictiona...


Generating an Indoor space routing graph using semantic-geometric method

The development of indoor Location-Based Services faces various challenges, one of which is the method of generating indoor routing graphs. Because purely geometric methods for generating indoor routing graphs have weaknesses, this study proposes a semantic-geometric method to cover the existing gaps in combining semantic and geometric methods. The proposed method uses the CityGML...


Hospitals’ Readiness to Implement Clinical Governance

Background: Quality of health services is one of the most important factors in the delivery of these services. Given the importance and vital role of quality in the health sector, a concept known as “Clinical Governance” (CG) has been introduced, which aims to enhance the quality of health services. Thus, this study aimed to assess private and public hospitals’ readiness to impl...


BOOT-TS: A Scalable Bootstrap for Massive Time-Series Data

We propose a scalable method of assessing the quality of machine learning algorithms over sampled time-series data. While bootstrap provides a simple and powerful means of estimating accuracy, its application to large time-series data still suffers from scalability issues. As an alternative we introduce BOOT-TS, a scalable extension of bootstrap for time-series which utilizes the recent advance...




Publication date: 2016